DM-42458: Modify variance plane when injecting sources and optionally vary injected light #29
Conversation
@jjbuchanan can you check that the line endings on your files haven't been modified? The diff is showing entire changes of file content even in the headers which haven't changed.
(Branch force-pushed from 3b19729 to 47d922d.)
@timj I've fixed the line endings, and the diffs look better now.

Fixed the whitespace and import order issues that were producing flake8 and isort errors.
Looks good, James. I'd still like to run this through a few pipelines to make sure everything makes sense there, but thought I'd give you some initial feedback today.

In addition to the line-by-line stuff, I think you can probably add some unit tests. Things like comparing `add_noise=True` with `add_noise=False` with otherwise identical inputs.
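The kind of test suggested here might be sketched as follows. Note that `inject_sources` and its Gaussian noise model are hypothetical stand-ins for the task's actual injection step, used only to illustrate the comparison:

```python
import numpy as np

def inject_sources(image, flux, add_noise, rng=None):
    # Hypothetical stand-in for the task's injection step: adds `flux`
    # to every pixel, plus Poisson-like Gaussian noise when requested.
    out = image + flux
    if add_noise:
        rng = rng or np.random.default_rng(0)
        out = out + rng.normal(scale=np.sqrt(abs(flux)), size=image.shape)
    return out

image = np.full((32, 32), 100.0)
noiseless = inject_sources(image, flux=50.0, add_noise=False)
noisy = inject_sources(image, flux=50.0, add_noise=True)

# With otherwise identical inputs, the two outputs should agree in the
# mean but differ pixel to pixel.
assert np.allclose(noiseless, 150.0)
assert not np.allclose(noisy, noiseless)
assert abs(noisy.mean() - noiseless.mean()) < 1.0
```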
```diff
@@ -72,6 +112,13 @@ class CoaddInjectTask(BaseInjectTask):
     _DefaultName = "coaddInjectTask"
     ConfigClass = CoaddInjectConfig

+    def run(self, injection_catalogs, input_exposure, psf, photo_calib, wcs):
+        self.log.info("Fitting flux vs. variance in each pixel.")
+        self.config.variance_scale = self.get_variance_scale(input_exposure)
```
I don't think we should pass calculated values through the config. The config gets written to the butler once per collection, so having values that differ (from exposure to exposure) within a collection will get lost.
Got it. I can instead pass `variance_scale` as an argument to `BaseInjectTask.run()`, on the same footing as things like `photo_calib`.
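A minimal sketch of this pattern, with stubbed-out classes (the real tasks take more arguments, and `get_variance_scale` here is a placeholder returning a fixed number):

```python
class BaseInjectTask:
    """Minimal stand-in; the real run() also takes psf, photo_calib, wcs, etc."""

    def run(self, input_exposure, variance_scale=None):
        # Per-exposure quantities travel as arguments, never via the
        # config, so nothing exposure-dependent is written to the butler.
        return {"variance_scale": variance_scale}


class CoaddInjectTask(BaseInjectTask):
    def get_variance_scale(self, exposure):
        return 1.5  # placeholder for the fitted flux-variance slope

    def run(self, input_exposure):
        scale = self.get_variance_scale(input_exposure)
        return super().run(input_exposure, variance_scale=scale)
```

This keeps the config immutable at runtime while still letting each exposure carry its own fitted scale.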
```diff
@@ -82,3 +129,124 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
         input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
         outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
         butler_quantum_context.put(outputs, output_refs)
+
+    def get_variance_scale(self, exposure):
```
Should add a docstring here explaining the basic algorithmic idea. Why there are two rounds of RANSAC, for example.
I came up with one. The algorithm has multiple parts that need explaining so it's a little verbose, but hopefully okay.
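A docstring along these lines, paraphrasing the algorithm as described in this thread (the exact wording in the PR will differ), could look like:

```python
def get_variance_scale(self, exposure):
    """Fit a linear flux-to-variance relation for one exposure.

    The fit proceeds in two rounds of RANSAC. The first round is run
    from many random seeds; because RANSAC is stochastic, individual
    fits can land on unrepresentative inlier sets. The resulting fit
    parameters are K-means clustered, and the largest cluster is taken
    as the set of stable fits. A second RANSAC round, seeded from that
    stable region, then produces the final slope used to scale the
    variance plane for injected sources. A warning is raised if no
    stable fits are found, or if the final slope is negative.
    """
```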
```diff
@@ -82,3 +129,124 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
         input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
         outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
         butler_quantum_context.put(outputs, output_refs)
+
+    def get_variance_scale(self, exposure):
+        x = exposure.image.array.flatten()
```
I think these variables are large enough in scope to get spelled out. Maybe `flux` and `var` or `variance`, but not `x` and `y`.

Also, I think `.flatten` will always produce a copy, where `.ravel` will only copy when necessary. So as long as these are read-only (I think that's right?), `.ravel` might be a tad faster.
Agree on the variable names.

Interesting point about `flatten` vs. `ravel`. I'll go with `ravel`, while noting that I'm also changing the way that NaN pixels are handled.
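The copy-versus-view distinction is straightforward to verify (standard NumPy behavior):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
flat = a.flatten()  # always returns a copy
rav = a.ravel()     # returns a view when the array is contiguous

assert not np.shares_memory(flat, a)  # flatten copied the data
assert np.shares_memory(rav, a)       # ravel did not

a[0, 0] = 99
assert flat[0] == 0   # the copy is unaffected by later writes
assert rav[0] == 99   # the view sees the change
```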
```python
bad_pixels = ~np.isfinite(x) | ~np.isfinite(y)
# Replace bad pixel values with the image median
if np.sum(bad_pixels) > 0:
    median_image_value = np.median(x)
```
Probably needs to be `np.nanmedian`?
Instead of replacing NaN pixels with the median value, I think it may be best to just ignore them, by assigning `flux = flux[good_pixels]`.
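The masking approach suggested here might look like this, applying `good_pixels` to both arrays so they stay aligned (array values are illustrative):

```python
import numpy as np

flux = np.array([1.0, np.nan, 3.0, 4.0])
variance = np.array([0.5, 0.6, np.inf, 0.8])

# Drop any pixel that is non-finite in either plane, rather than
# substituting the median into the bad entries.
good_pixels = np.isfinite(flux) & np.isfinite(variance)
flux = flux[good_pixels]
variance = variance[good_pixels]

assert flux.tolist() == [1.0, 4.0]
assert variance.tolist() == [0.5, 0.8]
```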
```python
    kmeans.labels_ == np.argmax(label_counts)
]
if len(stable_fit_seeds == 0):
    # Throw a warning
```
Need to find an appropriate warning to raise.
Implemented, and another one at the very end if the final slope ends up being negative.
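One possible shape for those two warnings (illustrative only). Note also that the intended condition is presumably `len(stable_fit_seeds) == 0`; as written in the diff, `len(stable_fit_seeds == 0)` is the length of a boolean array, which is truthy for any non-empty input:

```python
import warnings

def check_fit(stable_fit_seeds, slope):
    # Hypothetical checks matching the review discussion.
    if len(stable_fit_seeds) == 0:
        warnings.warn("No stable RANSAC fits found among the seeds.",
                      RuntimeWarning)
    if slope < 0:
        warnings.warn("Final fitted flux-variance slope is negative; "
                      "the variance scaling may be unreliable.",
                      RuntimeWarning)
```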
```python
# If the smooth image can be drawn successfully,
# we can do everything else.
if draw_succeeded:
```
Doesn't the try/except either succeed, continue (on GalSimFFTSizeError), or halt (on any other error)?
I think this is always true, so unneeded.
Good point.
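Concretely, the flag can be dropped because control flow only reaches the code after the `try`/`except` when the draw succeeded. A toy version with stand-in names (`draw_source` is hypothetical, and `ValueError` stands in for `GalSimFFTSizeError`):

```python
def draw_source(source):
    # Stand-in for the GalSim draw; raises for an oversized stamp.
    if source == "too_big":
        raise ValueError("stand-in for GalSimFFTSizeError")
    return source.upper()

processed = []
for source in ["a", "too_big", "b"]:
    try:
        stamp = draw_source(source)
    except ValueError:
        continue  # skip this source, as the real code does on FFT errors
    # Reaching this point means the draw succeeded,
    # so no `draw_succeeded` flag is needed.
    processed.append(stamp)

assert processed == ["A", "B"]
```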
```python
if add_noise:
    # For generating noise,
    # variance must be meaningful.
    if np.sum(variance_template.array < 0) > 0:
```
I'm guessing `np.any` is a hair faster. It has the possibility to short-circuit, for example.
Agreed.
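The two conditions are equivalent, and `np.any` may stop at the first `True` while `np.sum` always reduces over the whole array (a minimal check, standard NumPy):

```python
import numpy as np

variance = np.array([0.2, 0.3, -0.1, 0.4])

# Equivalent conditions for "any negative variance pixel".
assert bool(np.any(variance < 0)) == bool(np.sum(variance < 0) > 0)
assert np.any(variance < 0)
assert not np.any(np.array([0.2, 0.3]) < 0)
```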
```diff
@@ -104,6 +108,13 @@ class VisitInjectTask(BaseInjectTask):
     _DefaultName = "visitInjectTask"
     ConfigClass = VisitInjectConfig

+    def run(self, injection_catalogs, input_exposure, psf, photo_calib, wcs):
+        self.log.info("Fitting flux vs. variance in each pixel.")
+        self.config.variance_scale = self.get_variance_scale(input_exposure)
```
Same as earlier, I don't think we can pass calculated values through the config.
Changed as noted above.
```diff
@@ -128,3 +139,35 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
         input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
         outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
         butler_quantum_context.put(outputs, output_refs)
+
+    def get_variance_scale(self, exposure):
```
Needs a docstring.
Added.
```python
y[bad_pixels] = median_variance_value

# Only fit pixels with at least this much inst flux
brightness_cutoff = 500
```
I might suggest that this threshold should be stored and thus accessible in the config for the Task.
Yes, will do.
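Moving the threshold into the Task config could look roughly like this, a sketch using `lsst.pex.config` (the field name, docstring, and placement in `CoaddInjectConfig` are assumptions, not the PR's final form):

```python
from lsst.pex.config import Field

# Hypothetical addition inside the CoaddInjectConfig class definition:
brightness_cutoff = Field(
    dtype=float,
    doc="Only fit pixels with at least this much instrumental flux "
        "when estimating the flux-variance relation.",
    default=500.0,
)
```

The fitting code would then read `self.config.brightness_cutoff` instead of the hard-coded literal.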
```python
ransac = RANSACRegressor(
    loss="squared_error",
    residual_threshold=self.config.threshold_scale_1 * linear_mse,
    max_trials=1000,
```
Should this hard-coded value for `max_trials` be accessible at the function keyword level, or even the Task config level?
It makes sense to put it at the function keyword level at least, and I'm fine with putting it in the Task config if people are happy with the proliferation of config options. Ideally this value should be high enough that it's never the thing causing a pathology in practice; scikit-learn's default seems a little low, which is why I specify a different value here.
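Surfacing it as a keyword is a small change; a hypothetical signature (the function body is a placeholder, not the real fit):

```python
def get_variance_scale(exposure, max_trials=1000):
    # `max_trials` would be forwarded to
    # sklearn.linear_model.RANSACRegressor(max_trials=max_trials);
    # a Task config field could supply the value instead of the default.
    return {"max_trials": max_trials}  # stand-in for the fitted scale

assert get_variance_scale(None)["max_trials"] == 1000
assert get_variance_scale(None, max_trials=5000)["max_trials"] == 5000
```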
```python
# K-means cluster the first round of fits,
# to find the most stable results.
kmeans = KMeans(
    n_clusters=self.config.n_clusters_1, random_state=self.config.kmeans_seed_1, n_init=10
```
Should `n_init` be accessible at the Task config level?
I think that makes sense. I can put it there.